Aggregating Local Context for Accurate Scene Text Detection
نویسندگان
چکیده
Scene text reading continues to be of interest for many reasons including applications for the visually impaired and automatic image indexing systems. Here we propose a novel end-to-end scene text detection algorithm. First, for identifying text regions we design a novel Convolutional Neural Network (CNN) architecture that aggregates local surrounding information for cascaded, fast and accurate detection. The local information serves as context and provides rich cues to distinguish text from background noises. In addition, we designed a novel grouping algorithm on top of detected character graph as well as a text line refinement step. Text line refinement consists of a text line extension module, together with a text line filtering and regression module. Jointly they produce accurate oriented text line bounding box. Experiments show that our method achieved state-of-the-art performance in several benchmark data sets: ICDAR 2003 (IC03), ICDAR 2013 (IC13) and Street View Text (SVT).
منابع مشابه
An efficient method for cloud detection based on the feature-level fusion of Landsat-8 OLI spectral bands in deep convolutional neural network
Cloud segmentation is a critical pre-processing step for any multi-spectral satellite image application. In particular, disaster-related applications e.g., flood monitoring or rapid damage mapping, which are highly time and data-critical, require methods that produce accurate cloud masks in a short time while being able to adapt to large variations in the target domain (induced by atmospheric c...
متن کاملA Scene-based Model of Word Prediction
This paper proposes a semantico-statistical model of word prediction which uses local context of each scene in a text. Scenes are text segments, each of which displays local context. The occurrence of a word inside a scene is predicted by its local context. On the other hand, a text (de ned as a sequence of scenes) is less constrained by word occurrence, so word prediction becomes more di cult....
متن کاملCompressed-Sampling-Based Image Saliency Detection in the Wavelet Domain
When watching natural scenes, an overwhelming amount of information is delivered to the Human Visual System (HVS). The optic nerve is estimated to receive around 108 bits of information a second. This large amount of information can’t be processed right away through our neural system. Visual attention mechanism enables HVS to spend neural resources efficiently, only on the selected parts of the...
متن کاملRecognition of Characters from Streaming Videos
Over the past few years, Video has become one of the prime source for recreation, be it Television or Internet. Television brings a whole lot of professionally produced video content (International or local, sports or educational, news or entertainment) to the home for the masses. Similarly, Internet hosts a whole lot of video content uploaded by other users. Understanding the context of the vi...
متن کاملAccurate video text detection through classification of low and high contrast images
Detection of both scene text and graphic text in video images is gaining popularity in the area of information retrieval for efficient indexing and understanding the video. In this paper, we explore a new idea of classifying low contrast and high contrast video images in order to detect accurate boundary of the text lines in video images. In this work, high contrast refers to sharpness while lo...
متن کامل